Audio and Visual Processing to Enhance User Interfaces
Abstract
SUMMARY This report details the work carried out between October 1996 and June 1997, and the results achieved so far. Numerous speech and person recognition experiments have been performed using both speech and visual lip features. The discriminatory properties of audio and visual features are examined, along with the performance of two classifiers, namely vector quantization (VQ) and dynamic time warping (DTW). The effect of varying the amount of training data is observed, and both static and dynamic features are investigated. The results for person recognition show that both the audio and visual features contain valuable speaker-specific information. For speech recognition, the audio signal, as might be expected, produced easily the better results when compared with the visual features. For both speech and person recognition, increasing the amount of representative training data improved the recognition results. Also, the DTW-type classifier performed marginally better than the VQ classifier for recognition, with only a few (unexplained) exceptions. The audio dynamic features contained valuable speech and speaker-idiosyncratic information, while the visual dynamic features contained very little discriminatory information for either speech or person recognition. Possible avenues for future research related to enhancing a user interface are postulated.
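For readers unfamiliar with the two classifiers named in the summary, the following is a minimal sketch of how a DTW-type and a VQ-type score might be computed over sequences of feature vectors. It is not the report's implementation: the function names (`dtw_distance`, `vq_distortion`, `classify_dtw`), the Euclidean local distance, and the unconstrained warping path are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean local distance and an unconstrained warping path.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # stay on the same frame of b
                cost[i][j - 1],      # stay on the same frame of a
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    total = sum(min(math.dist(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def classify_dtw(sample, templates):
    """Return the label whose template has the smallest DTW cost to the sample.

    `templates` is a hypothetical dict mapping a label to one reference sequence.
    """
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

The DTW score keeps the temporal order of the frames, whereas the VQ score treats the frames as an unordered set compared against a codebook. A VQ classifier would analogously pick the label whose codebook gives the lowest distortion.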
Similar resources
Selecting and Extracting Effective Features of SSVEP-based Brain-Computer Interface
User interfaces have always been among the most important fields of application and study in information technology. The development and expansion of cognitive science studies and the functionalization of its tools such as BCI, as well as the popularization of methods such as SSVEP to stimulate brain waves, have led to using these techniques every day, especially in appropriate solutions for physically and menta...
Sound Processing in OpenMusic
This article introduces some new possibilities of audio manipulations and sound processing in the Computer-Aided Composition environment OpenMusic. Interfaces with underlying sound processing systems are described, with an emphasis on the use of the symbolic and visual programming environment for the design of sound computation processes.
A New Trust Model for B2C E-Commerce Based on 3D User Interfaces
Lack of trust is one of the key bottlenecks in e-commerce development. Many advanced technologies now try to address trust issues in e-commerce. One of them is the use of suitable user interfaces. This paper investigates the functionality and capabilities of 3D graphical user interfaces with regard to building trust among the customers of the next generation of B2C e-commerce websit...
Listening to Unfamiliar Voices in Spatial Audio: Does Visualization of Spatial Position Enhance Voice Identification?
The use of spatial audio to present voices from unique locations around a listener's head has been demonstrated to enhance the perception and cognition of auditory events for a variety of listening tasks. This paper describes experimental research to determine whether a combination of spatialization and simple visual representation of voice locations further builds upon the known benefits of sp...
An Anatomy of Graph-Based User Interfaces for Media Processing
Graph-based user interfaces are employed in a variety of software such as audio synthesizers, video compositing tools, and database application builders. All of these uses afford the graphical metaphor of a graph: “Nodes” such as sound generators or filters are tied together by “links,” which may represent signal flow or conceptual relations. Focusing on media production tools, we have examined...